Comparing SDC Methods for Microdata on the Basis of Information Loss and Disclosure Risk
Authors
Abstract
This paper presents the first empirical comparison of statistical disclosure control (SDC) methods for microdata that encompasses both continuous and categorical microdata. Based on re-identification experiments, we try to optimize the tradeoff between information loss and disclosure risk. First, relevant SDC methods for continuous and categorical microdata are identified. Then generic information loss measures (not targeted to specific data uses) are defined for both the continuous and the categorical case. Disclosure risk is assessed using empirical re-identification, with two approaches: Euclidean record linkage and probabilistic record linkage. The results of this comparison will be used to develop better SDC methods for microdata in the recently started EU-funded project CASC.
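As a rough illustration of how disclosure risk might be estimated through Euclidean record linkage, the sketch below links each masked record to its nearest original record and reports the share of correct links. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names `standardize` and `linkage_rate`, the per-column standardization, and the convention that `masked[i]` was derived from `original[i]` are all hypothetical choices made for the example.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def linkage_rate(original, masked):
    """Fraction of masked records whose nearest original record is their source.

    Assumption (hypothetical): masked[i] was produced from original[i].
    """
    orig_std = standardize(original)
    mask_std = standardize(masked)
    hits = 0
    for i, m in enumerate(mask_std):
        # Nearest original record under Euclidean distance.
        nearest = min(range(len(orig_std)), key=lambda j: math.dist(m, orig_std[j]))
        if nearest == i:
            hits += 1
    return hits / len(masked)


# Toy data (invented for illustration): age and income for four respondents.
original = [[20, 1000], [35, 2500], [50, 4000], [65, 5500]]
masked = [[21, 1050], [34, 2450], [51, 4100], [64, 5400]]
rate = linkage_rate(original, masked)
```

A higher rate means more masked records can be traced back to their source, which would indicate higher disclosure risk for that masking method. Probabilistic record linkage, the second approach named in the abstract, is not covered by this sketch.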
Related resources
A Quantitative Comparison of Disclosure Control Methods for Microdata
As described in Chapter 5, there is a plethora of statistical disclosure control (SDC) methods to protect microdata. This chapter provides guidance in choosing a particular SDC method by comparing some of the methods discussed in Chapter 5 on the basis of both information loss and disclosure risk. Information loss can be readily quantified using analytical measures (either generic or data-use-s...
Statistical Disclosure Control Methods for Census Frequency Tables
This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...
Disclosure Control Methods and Information Loss for Microdata
Statistical disclosure control (SDC) seeks to modify statistical data so that they can be published without giving away confidential information that can be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by database users. SDC methods for microdata are usually known as masking methods, of which there is a...
Statistical Disclosure Control for Microdata Using the R-Package sdcMicro
The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying sta...
A Survey of Inference Control Methods for Privacy-Preserving Data Mining
Inference control in databases, also known as Statistical Disclosure Control (SDC), is about protecting data so they can be published without revealing confidential information that can be linked to specific individuals among those to which the data correspond. This is an important application in several areas, such as official statistics, health statistics, e-commerce (sharing of consumer data...